Abstract. Let $x$ be a complex random variable with mean zero and bounded variance. Let $N_n$ be
the random matrix of size $n$ whose entries are iid copies of $x$ and $M$ be a fixed matrix of the same
size. The goal of this paper is to give a general estimate for the condition number and least singular
value of the matrix $M + N_n$, generalizing an earlier result of Spielman and Teng for the case when
$x$ is gaussian.

Our investigation reveals an interesting fact that the "core" matrix $M$ does play a role on tail
bounds for the least singular value of $M + N_n$. This does not occur in Spielman-Teng studies when
$x$ is gaussian. Consequently, our general estimate involves the norm $\|M\|$. In the special case when
$\|M\|$ is relatively small, this estimate is nearly optimal and extends or refines existing results.



1. Introduction 

Let $M$ be an $n \times n$ matrix and $s_1(M) \ge \dots \ge s_n(M)$ its singular values. The condition number of $M$,
as defined by numerical analysts, is

$$\kappa(M) := s_1(M)/s_n(M) = \|M\| \|M^{-1}\|.$$

This parameter is of fundamental importance in numerical linear algebra and related areas, such as 
linear programming. In particular, the value 



$$L(M) := \log \kappa(M)$$

measures the (worst case) loss of precision the equation $Mx = b$ can exhibit [22, 2].
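
As a small worked illustration (the $2 \times 2$ matrix below is a hypothetical example chosen for this note, not taken from the references), both quantities can be computed by hand for a diagonal matrix:

```latex
% Hypothetical example: a diagonal matrix with one small singular value.
\[
M = \begin{pmatrix} 1 & 0 \\ 0 & 10^{-6} \end{pmatrix},
\qquad s_1(M) = 1, \quad s_2(M) = 10^{-6},
\]
\[
\kappa(M) = \frac{s_1(M)}{s_2(M)} = 10^{6},
\qquad L(M) = \log \kappa(M) = 6 \log 10 .
\]
% A relative error \delta in b can be amplified to a relative error
% of up to \kappa(M)\,\delta in the solution x of Mx = b.
```

In this example roughly six decimal digits of accuracy can be lost in the worst case.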

The problem of understanding the typical behavior of $\kappa(M)$ and $L(M)$ when the matrix $M$ is random
has a long history. This was first raised by von Neumann and Goldstine in their study of numerical
inversion of large matrices [31]. Several years later, the problem was restated in a survey of Smale [22]
on the efficiency of algorithms of analysis. One of Smale's motivations was to understand the efficiency
of the simplex algorithm in linear programming. The problem is also at the core of Demmel's plan
about the investigation of the probability that a numerical analysis problem is difficult [8] (see also
[19] for a work that inspires this investigation).

To make the problem precise, the most critical issue is to choose a probability distribution for $M$. A
convenient model has been random matrices with independent gaussian entries (either real or complex).
An essential feature of this model is that here the joint distribution of the eigenvalues can be written
down precisely:

$$\text{(Real Gaussian)} \qquad c_1(n) \prod_{1 \le i < j \le n} |\lambda_i - \lambda_j| \exp\Big(-\sum_{i=1}^n \lambda_i^2/2\Big), \tag{1}$$

$$\text{(Complex Gaussian)} \qquad c_2(n) \prod_{1 \le i < j \le n} |\lambda_i - \lambda_j|^2 \exp\Big(-\sum_{i=1}^n \lambda_i^2/2\Big). \tag{2}$$

Here $c_1(n)$, $c_2(n)$ are normalization factors whose explicit formulae can be seen in, for example, [17].

Most questions about the spectrum of these random matrices can then be answered by estimating 
a properly defined integral with respect to these measures. Many advanced techniques have been 
worked out to serve this purpose (see, for instance [17]). In particular, the condition number is well 
understood, thanks to works of Kostlan, Ocneanu [22, 13], Edelman [6] and many others (see Section 2).

The gaussian model, however, has serious shortcomings. As pointed out by many researchers (see, for
example [3, 24]), the gaussian model does not reflect the arbitrariness of the input. Let us consider, for
example, a random matrix with independent real gaussian entries. By sharp concentration results, one
can show that the fraction of entries with absolute values at most 1 is, with overwhelming probability,
close to the absolute constant $\frac{1}{\sqrt{2\pi}} \int_{-1}^{1} \exp(-t^2/2)\,dt$. Many classes of matrices that occur in practice
simply do not possess this property. This problem persists even when one replaces gaussian by
another fixed distribution, such as Bernoulli.

About 10 years ago, Spielman and Teng [24, 25], motivated by Demmel's plan and the problem of
understanding the efficiency of the simplex algorithm, proposed a new, exciting distribution. Spielman
and Teng observed that while the ideal input may be a fixed matrix $M$, it is likely that the computer
will work with a perturbation $M + N$, where $N$ is a random matrix representing random noise. This
raises the issue of studying the distribution of the condition number of $M + N$. This problem
is at the heart of the so-called Spielman-Teng smooth analysis. (See [24, 25] for a more detailed
discussion and [3, 4, 5, 26, 9] for many related works on this topic.) Notice that the special case
$M = 0$ corresponds to the setting considered in the previous paragraphs.

The Spielman-Teng model nicely addresses the problem about the arbitrariness of the inputs, as in this
model every matrix generates a probability space of its own. In their papers, Spielman and Teng
considered mostly gaussian noise (in some cases they also considered other continuous distributions
such as uniform on $[-1,1]$). However, in the digital world, randomness often does not have a gaussian
nature. To start with, all real data are finite. In fact, in many problems (particularly those in
integer programming) all entries of the matrix are integers. The random errors made by digital
devices (for example, sometimes a bit gets flipped) are obviously of a discrete nature. In other problems,
for example those in engineering, the data may contain measurements where it would be natural to
assume gaussian errors. On the other hand, data are usually strongly truncated. For example, if an
entry of our matrix represents the mass of an object, then we expect to see a number like 12.679 (say,
tons), rather than 12.6792347043641259. Thus, instead of the gaussian distribution, we (and/or our
computers) often work with a discrete distribution, whose support is relatively small and does not
depend on the size of the matrix. (A good toy example is the random Bernoulli matrix, whose entries
take values $\pm 1$ with probability one half.) This leads us to the following question.

Question. (Smooth analysis of the condition number) Estimate the condition number of a random
matrix $M_n := M + N_n$, where $M$ is a fixed matrix of size $n$, and $N_n$ is a general random matrix.

The goal of this paper is to investigate this question, where, as a generalization of the Spielman-Teng
model, we think of $N_n$ as a matrix with independent random entries which (instead of being gaussian)
have arbitrary distributions. Our main result will show that with high probability, $M_n$ is
well-conditioned. This result could be useful in further studies of smooth analysis in linear programming.
The Spielman-Teng smooth analysis of the simplex algorithm [24, 25] was done with gaussian noise.
It is a natural and (from the practical point of view) important question to repeat this analysis with
discrete noise (such as Bernoulli). This question was posed by Spielman to the authors a few years
ago. The paper [24] also contains a specific conjecture on the least singular value of a random Bernoulli
matrix.

In connection, we should mention here a recent series of papers by Bürgisser, Cucker and Lotz [3, 4, 5],
which discussed the smooth analysis of condition number under a somewhat different setting (they 
considered the notion of conic condition number and a different kind of randomness). 

Before stating mathematical results, let us describe our notation. We use the usual asymptotic
notation $X = O(Y)$ to denote the estimate $|X| \le CY$ for some constant $C > 0$ (independent of
$n$); $X = \Omega(Y)$ to denote the estimate $X \ge cY$ for some $c > 0$ independent of $n$, and $X = \Theta(Y)$
to denote the estimates $X = O(Y)$ and $X = \Omega(Y)$ holding simultaneously. In some cases, we write
$X \ll Y$ instead of $X = O(Y)$ and $X \gg Y$ instead of $X = \Omega(Y)$. Notations such as $X = O_{a,b}(Y)$
or $X \ll_{a,b} Y$ mean that the hidden constant in $O$ or $\ll$ depends on previously defined constants
$a$ and $b$. We use $o(1)$ to denote any quantity that goes to zero as $n \to \infty$. $X = o(Y)$ means that
$X/Y = o(1)$.

Recall that

$$\kappa(M) := s_1(M)/s_n(M) = \|M\| \|M^{-1}\|.$$

Since $\|M\|^2 \ge \sum_{ij} |m_{ij}|^2/n$ (where $m_{ij}$ denote the entries of $M$) it is expected that $\|M\| = n^{\Omega(1)}$.
Following the literature, we say that $M$ is well-conditioned (or well-posed) if $\kappa(M) = n^{O(1)}$ or
(equivalently) $L(M) = O(\log n)$.

By the triangle inequality,

$$\|M\| - \|N_n\| \le \|M + N_n\| \le \|M\| + \|N_n\|.$$

Under very general assumptions, the random matrix $N_n$ satisfies $\|N_n\| = n^{O(1)}$ with overwhelming
probability (see many estimates in Section 3). Thus, in order to guarantee that $M + N_n$ is
well-conditioned (with high probability), it is natural to assume that

$$\|M\| = n^{O(1)}. \tag{3}$$



This is not only a natural, but fairly safe assumption to make (with respect to the applicability of our
studies). Most large matrices in practice satisfy this assumption, as their entries are usually not too
large compared to their sizes.

Our main result shows that under this assumption and a very general assumption on the entries of
$N_n$, the matrix $M + N_n$ is well-conditioned, with high probability. This result extends and bridges
several existing results in the literature (see next two sections).

Notice that under assumption (3), if we want to show that $M + N_n$ is typically well-conditioned, it
suffices to show that

$$\|(M + N_n)^{-1}\| = s_n(M + N_n)^{-1} = n^{O(1)}$$

with high probability. Thus, we will formulate most results in the form of a tail bound for the least
singular value of $M + N_n$. The typical form will be

$$\mathbf{P}(s_n(M + N_n) \le n^{-B}) \le n^{-A}$$

where $A$, $B$ are positive constants and $A$ increases with $B$. The relation between $A$ and $B$ is of
importance and will be discussed at length.



2. Previous results 

Let us first discuss the gaussian case. Improving results of Kostlan and Ocneanu [22], Edelman [6]
computed the limiting distribution of $\sqrt{n}\, s_n(N_n)$ when $N_n$ is gaussian. His result implies

Theorem 2.1. There is a constant $C > 0$ such that the following holds. Let $x$ be the real gaussian
random variable with mean zero and variance one, let $N_n$ be the random matrix whose entries are iid
copies of $x$. Then for any constant $t > 0$

$$\mathbf{P}(s_n(N_n) \le t) \le n^{1/2} t.$$

Concerning the more general model $M + N_n$, Sankar, Spielman and Teng proved [26]

Theorem 2.2. There is a constant $C > 0$ such that the following holds. Let $x$ be the real gaussian
random variable with mean zero and variance one, let $N_n$ be the random matrix whose entries are iid
copies of $x$, and let $M$ be an arbitrary fixed matrix. Let $M_n := M + N_n$. Then for any $t > 0$

$$\mathbf{P}(s_n(M_n) \le t) \le C n^{1/2} t.$$

Once we give up the gaussian assumption, the study of the least singular value $s_n$ becomes much harder
(in particular for discrete distributions such as Bernoulli, in which $x = \pm 1$ with equal probability 1/2).
For example, it is already non-trivial to prove that the least singular value of a random Bernoulli matrix
is positive with probability $1 - o(1)$. This was first done by Komlós in 1967 [14], but good quantitative
lower bounds were not available until recently. In a series of papers, Tao-Vu and Rudelson-Vershynin
addressed this question [27, 29, 20, 21] and proved a lower bound of the form $n^{-\Theta(1)}$ for $s_n$ with high
probability.

We say that $x$ is subgaussian if there is a constant $B > 0$ such that

$$\mathbf{P}(|x| \ge t) \le 2\exp(-t^2/B^2)$$

for all $t > 0$. The smallest $B$ is called the subgaussian moment of $x$. The following is a corollary of a
more general theorem by Rudelson and Vershynin [21, Theorem 1.2].

Theorem 2.3. Let $x$ be a subgaussian random variable with zero mean, variance one and subgaussian
moment $B$ and $A$ be an arbitrary positive constant. Let $N_n$ be the random matrix whose entries are
iid copies of $x$. Then there is a positive constant $C$ (depending on $B$) such that for any $t \ge n^{-A}$ we
have

$$\mathbf{P}(s_n(N_n) \le t) \le C n^{1/2} t.$$

We again turn to the general model $M + N_n$. In [29], the present authors proved

Theorem 2.4. [29, Theorem 2.1] Let $x$ be a random variable with non-zero variance. Then for any
constants $A, C \ge 0$ there exists a constant $B \ge 0$ (depending on $A, C, x$) such that the following holds.
Let $N_n$ be the random matrix whose entries are iid copies of $x$, and let $M$ be any deterministic $n \times n$
matrix with norm $\|M\| \le n^C$. Then

$$\mathbf{P}(s_n(M + N_n) \le n^{-B}) \le n^{-A}.$$

Notice that this theorem requires very little of the variable $x$. It does not need to be sub-gaussian
nor even to have bounded moments. All we ask is that the variance is bounded away from zero, which basically
means $x$ is indeed "random". Thus, it guarantees the well-conditionedness of $M + N_n$ in a very general
setting.

The weakness of this theorem is that the dependence of $B$ on $A$ and $C$, while explicit, is too generous.
The main result of this paper, Theorem 3.2, will improve this dependence significantly and provide a
common extension of Theorem 2.4 and Theorem 2.3.



3. Main result 

As already pointed out, an important point is the relation between the constants $A$, $B$ in a bound of
the form

$$\mathbf{P}(s_n(M + N_n) \le n^{-B}) \le n^{-A}.$$

In Theorem 2.2, we have a simple (and optimal) relation $B = A + 1/2$. It is natural to conjecture
that this relation holds for other, non-gaussian, models of random matrices. In fact, this conjecture
was the starting point of this study. Quite surprisingly, it turns out not to be the case.

Theorem 3.1. There are positive constants $c_1$ and $c_2$ such that the following holds. Let $N_n$ be the
$n \times n$ random Bernoulli matrix with $n$ even. For any $L \ge n$, there is an $n \times n$ deterministic matrix
$M$ such that $\|M\| = L$ and

$$\mathbf{P}\Big(s_n(M + N_n) \le c_1 \frac{n}{L}\Big) \ge c_2 n^{-1/2}.$$

The assumption that $n$ is even is for convenience and can easily be removed by replacing the Bernoulli
matrix by a random matrix whose entries take values $0, \pm 1$ with probability 1/3 (say). Notice that if
$L = n^D$ for some constant $D$ then we have the lower bound

$$\mathbf{P}(s_n(M + N_n) \le c_1 n^{-D+1}) \ge c_2 n^{-1/2},$$

which shows that one cannot expect Theorem 2.2 to hold in general and that the norm of $M$ should
play a role in tail bounds of the least singular value.
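
To spell out why this lower bound is incompatible with a bound of the form in Theorem 2.2, one can compare the two estimates at the same threshold. The computation below is a sketch added for illustration; it assumes (hypothetically) that the gaussian bound $\mathbf{P}(s_n(M_n) \le t) \le C n^{1/2} t$ also held for the Bernoulli model.

```latex
% Assume, for contradiction, P(s_n(M + N_n) <= t) <= C n^{1/2} t for all t > 0.
% Take L = n^D and t = c_1 n^{-D+1}, the threshold from Theorem 3.1.
\[
c_2 n^{-1/2}
\;\le\; \mathbf{P}\big(s_n(M + N_n) \le c_1 n^{-D+1}\big)
\;\le\; C n^{1/2} \cdot c_1 n^{-D+1}
\;=\; C c_1 n^{3/2 - D}.
\]
% For any fixed D > 2 the right-hand side decays faster than n^{-1/2},
% so the two inequalities cannot both hold once n is large.
```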

The main result of this paper is the following. 

Theorem 3.2. Let $x$ be a random variable with mean zero and bounded second moment, and let
$\gamma \ge 1/2$, $A \ge 0$ be constants. Then there is a constant $c$ depending on $x, \gamma, A$ such that the following
holds. Let $N_n$ be the random matrix of size $n$ whose entries are iid copies of $x$, $M$ be a deterministic
matrix satisfying $\|M\| \le n^\gamma$, and let $M_n := M + N_n$. Then

$$\mathbf{P}(s_n(M_n) \le n^{-(2A+1)\gamma}) \le c\big(n^{-A+o(1)} + \mathbf{P}(\|N_n\| \ge n^\gamma)\big).$$

Note that this theorem only assumes bounded second moment on $x$. The assumption that the entries
of $N_n$ are iid is for convenience. A slightly weaker result would hold if one omits this assumption.

Corollary 3.3. Let $x$ be a random variable with mean zero and bounded second moment, and let
$\gamma \ge 1/2$, $A \ge 0$ be constants. Then there is a constant $c$ depending on $x, \gamma, A$ such that the following
holds. Let $N_n$ be the random matrix of size $n$ whose entries are iid copies of $x$, $M$ be a deterministic
matrix satisfying $\|M\| \le n^\gamma$, and let $M_n := M + N_n$. Then

$$\mathbf{P}(\kappa(M_n) \ge 2n^{(2A+2)\gamma}) \le c\big(n^{-A+o(1)} + \mathbf{P}(\|N_n\| \ge n^\gamma)\big).$$

Proof. Since $\kappa(M_n) = s_1(M_n)/s_n(M_n)$, it follows that if $\kappa(M_n) \ge 2n^{(2A+2)\gamma}$, then at least one of the
two events $s_n(M_n) \le n^{-(2A+1)\gamma}$ and $s_1(M_n) \ge 2n^\gamma$ holds. On the other hand,

$$s_1(M_n) \le s_1(M) + s_1(N_n) = \|M\| + \|N_n\| \le n^\gamma + \|N_n\|.$$

The claim follows. □
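
The last step can be written out as a union bound. The display below is a sketch of that step, using only the two events named in the proof and the bound of Theorem 3.2; the constant $c + 1$ that appears is then renamed $c$.

```latex
% If s_1(M_n) >= 2 n^\gamma, then n^\gamma + ||N_n|| >= 2 n^\gamma, i.e. ||N_n|| >= n^\gamma.
\begin{align*}
\mathbf{P}\big(\kappa(M_n) \ge 2n^{(2A+2)\gamma}\big)
 &\le \mathbf{P}\big(s_n(M_n) \le n^{-(2A+1)\gamma}\big)
    + \mathbf{P}\big(s_1(M_n) \ge 2n^{\gamma}\big) \\
 &\le c\big(n^{-A+o(1)} + \mathbf{P}(\|N_n\| \ge n^{\gamma})\big)
    + \mathbf{P}\big(\|N_n\| \ge n^{\gamma}\big) \\
 &\le (c+1)\big(n^{-A+o(1)} + \mathbf{P}(\|N_n\| \ge n^{\gamma})\big).
\end{align*}
```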

In the rest of this section, we deduce a few corollaries and connect them with the existing results. 

First, consider the special case when $x$ is subgaussian. In this case, it is well-known that one can have
a strong bound on $\mathbf{P}(\|N_n\| \ge n^\gamma)$ thanks to the following theorem (see [21] for references).

Theorem 3.4. Let $B$ be a positive constant. There are positive constants $C_1, C_2$ depending on $B$
such that the following holds. Let $x$ be a subgaussian random variable with zero mean, variance one
and subgaussian moment $B$ and $N_n$ be the random matrix whose entries are iid copies of $x$. Then

$$\mathbf{P}(\|N_n\| \ge C_1 n^{1/2}) \le \exp(-C_2 n).$$

If one replaces the subgaussian condition by the weaker condition that $x$ has fourth moment bounded by
$B$, then one has a weaker conclusion that

$$\mathbf{E}(\|N_n\|) \le C_1 n^{1/2}.$$

From Theorem 3.2 and Theorem 3.4 we see that

Corollary 3.5. Let $A$ and $\gamma$ be arbitrary positive constants. Let $x$ be a subgaussian random variable
with zero mean and variance one and $N_n$ be the random matrix whose entries are iid copies of $x$. Let
$M$ be a deterministic matrix such that $\|M\| \le n^\gamma$ and set $M_n = M + N_n$. Then

$$\mathbf{P}\big(s_n(M_n) \le (n^{1/2} + \|M\|)^{-2A-1}\big) \le n^{-A+o(1)}. \tag{4}$$

In the case $\|M\| = O(n^{1/2})$ (which of course includes the $M = 0$ special case), (4) implies

Corollary 3.6. Let $A$ be an arbitrary positive constant. Let $x$ be a subgaussian random variable with
zero mean and variance one and $N_n$ be the random matrix whose entries are iid copies of $x$. Let $M$
be a deterministic matrix such that $\|M\| = O(n^{1/2})$ and set $M_n = M + N_n$. Then

$$\mathbf{P}(s_n(M_n) \le n^{-A-1/2}) \le n^{-A+o(1)}. \tag{5}$$

Up to a loss of magnitude $n^{o(1)}$, this matches Theorem 2.3, which treated the base case $M = 0$.

If we assume bounded fourth moment instead of subgaussian, we can use the second half of Theorem
3.4 to deduce

Corollary 3.7. Let $x$ be a random variable with zero mean, variance one and bounded fourth
moment and $N_n$ be the random matrix whose entries are iid copies of $x$. Let $M$ be a deterministic
matrix such that $\|M\| = n^{O(1)}$ and set $M_n = M + N_n$. Then

$$\mathbf{P}\big(s_n(M_n) \le (n^{1/2} + \|M\|)^{-1+o(1)}\big) = o(1). \tag{6}$$

In the case $\|M\| = O(n^{1/2})$, this implies that almost surely $s_n(M_n) \ge n^{-1/2+o(1)}$. For the special case
$M = 0$, this matches (again up to the $o(1)$ term) [21, Theorem 1.1].

Let us now take a look at the influence of $\|M\|$ on the bound. Obviously, there is a gap between (4)
and Theorem 3.1. On the other hand, by setting $A = 1/2$, $L = n^\gamma$ and assuming that $\mathbf{P}(\|N_n\| \ge n^\gamma)$
is negligible (i.e., super-polynomially small in $n$), we can deduce from Theorem 3.2 that

$$\mathbf{P}(s_n(M_n) \le c_1 L^{-2}) \le c_2 n^{-1/2+o(1)}.$$

This, together with Theorem 3.1, suggests that the influence of $\|M\|$ in $s_n(M_n)$ is of polynomial type.

In the next discussion, let us normalize and assume that $x$ has variance one. One can deduce a bound
on $\|N_n\|$ from the simple computation

$$\mathbf{E}\|N_n\|^2 \le \mathbf{E} \operatorname{tr} N_n N_n^* = n^2.$$

By Chebyshev's inequality we thus have

$$\mathbf{P}(\|N_n\| \ge n^{1+A/2}) \le n^{-A}$$

for all $A \ge 0$.

Applying Theorem 3.2 we obtain

Corollary 3.8. Let $x$ be a random variable with mean zero and variance one and $N_n$ be the random
matrix whose entries are iid copies of $x$. Then for any constant $A \ge 0$

$$\mathbf{P}\big(s_n(N_n) \le n^{-1-\frac{5}{2}A-A^2}\big) \le n^{-A+o(1)}.$$

In particular, $s_n(N_n) \ge n^{-1-o(1)}$ almost surely.
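
For the reader's convenience, the two computations behind this corollary can be sketched as follows. The substitution $M = 0$, $\gamma = 1 + A/2$ into Theorem 3.2 is the natural choice given the norm bound above; it is spelled out here as an illustration rather than quoted from the text.

```latex
% Step 1: second-moment (Chebyshev-type) bound on the operator norm.
\[
\mathbf{P}\big(\|N_n\| \ge n^{1+A/2}\big)
 = \mathbf{P}\big(\|N_n\|^2 \ge n^{2+A}\big)
 \le \frac{\mathbf{E}\|N_n\|^2}{n^{2+A}}
 \le \frac{n^2}{n^{2+A}} = n^{-A}.
\]
% Step 2: Theorem 3.2 with M = 0 and \gamma = 1 + A/2.
\[
(2A+1)\gamma = (2A+1)\Big(1 + \frac{A}{2}\Big) = 1 + \frac{5}{2}A + A^2,
\]
\[
\mathbf{P}\big(s_n(N_n) \le n^{-1-\frac{5}{2}A-A^2}\big)
 \le c\big(n^{-A+o(1)} + n^{-A}\big) = n^{-A+o(1)}.
\]
```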



It is clear that one can obtain better bounds for $s_n$, provided better estimates on $\|N_n\|$. The idea
of using Chebyshev's inequality is very crude (we just want to give an example) and there are more
sophisticated tools. One can, for instance, use higher moments. The expectation of a $k$-th moment
can be expressed as a sum of many terms, each corresponding to a certain closed walk of length $k$ on the
complete graph on $n$ vertices (see [12, 32]). If the higher moments of $N_n$ (while not bounded) do
not increase too fast with $n$, then the main contribution to the expectation of the $k$-th moment still
comes from terms which correspond to walks using each edge of the graph either 0 or 2 times. The
expectation of such a term involves only the second moment of the entries in $N_n$. The reader may
want to work this out as an exercise.

One can also use the following nice estimate of Seginer [23]:

$$\mathbf{E}\|N_n\| = O\Big(\mathbf{E} \max_{1 \le i \le n} \sqrt{\sum_{j=1}^n |x_{ij}|^2} + \mathbf{E} \max_{1 \le j \le n} \sqrt{\sum_{i=1}^n |x_{ij}|^2}\Big),$$

where $x_{ij}$ denote the entries of $N_n$.

The rest of the paper is organized as follows. In the next section, we prove Theorem 3.1. The
remaining sections are devoted to the proof of Theorem 3.2. This proof combines several tools that
have been developed in recent years. It starts with an $\varepsilon$-net argument (in the spirit of those used in
[27, 20, 29, 21]). Two important technical ingredients are Theorem 6.8 from [29] and Lemma 9.1 from
[21].

4. Theorem 3.1: The influence of M 



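Let $M'$ be the $(n-1) \times n$ matrix obtained by concatenating the matrix $L I_{n-1}$ with an all $L$ column,
where $L$ is a large number (we will set $L \ge n$). The $n \times n$ matrix $M$ is obtained from $M'$ by adding
to it a (first) all zero row; thus

$$M = \begin{pmatrix} 0 & 0 & \cdots & 0 & 0 \\ L & 0 & \cdots & 0 & L \\ 0 & L & \cdots & 0 & L \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & L & L \end{pmatrix}.$$

It is easy to see that

$$\|M\| = \Theta(L).$$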

Now consider $M_n := M + N_n$ where the entries of $N_n$ are iid Bernoulli random variables. We claim that

$$\mathbf{P}\Big(s_n(M_n) \ll \frac{n}{L}\Big) \gg n^{-1/2}.$$

Let $M'_n$ be the (random) $(n-1) \times n$ matrix formed by the last $n-1$ rows of $M_n$. Let $v \in \mathbb{R}^n$ be a
unit normal vector of the $n-1$ rows of $M'_n$. By replacing $v$ with $-v$ if necessary we may write $v$ in
the form

$$v = \Big(\frac{1}{\sqrt n} + a_1, \frac{1}{\sqrt n} + a_2, \dots, \frac{1}{\sqrt n} + a_{n-1}, \frac{-1}{\sqrt n} + a_n\Big)$$

where $\frac{-1}{\sqrt n} + a_n \le 0$.
V™ — 



Let $\xi_i$ be iid Bernoulli random variables. Multiplying $v$ with the first row of $M'_n$, we have

$$0 = (L + \xi_1)\Big(\frac{1}{\sqrt n} + a_1\Big) + (L + \xi_n)\Big(\frac{-1}{\sqrt n} + a_n\Big) = L(a_1 + a_n) + \Big(\frac{\xi_1 - \xi_n}{\sqrt n} + \xi_1 a_1 + \xi_n a_n\Big).$$

Since $|a_i| = O(1)$, it follows that $|a_1 + a_n| = O(\frac{1}{L})$. Repeating the argument with all other rows, we
conclude that $|a_i + a_n| = O(\frac{1}{L})$ for all $1 \le i \le n-1$.



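Since $v$ has unit norm, we also have

$$\sum_{i=1}^{n-1} \Big(\frac{1}{\sqrt n} + a_i\Big)^2 + \Big(\frac{-1}{\sqrt n} + a_n\Big)^2 = 1,$$

which implies that

$$\frac{2}{\sqrt n}(a_1 + \dots + a_{n-1} - a_n) + \sum_{i=1}^n a_i^2 = 0.$$

This, together with the fact that $|a_i + a_n| = O(\frac{1}{L})$ for all $1 \le i \le n-1$, yields

$$-2n a_n\Big(\frac{-1}{\sqrt n} + \frac{a_n}{2}\Big) = O\Big(\frac{\sqrt n}{L} + \frac{n|a_n|}{L} + \frac{n}{L^2}\Big).$$

Since $\frac{-1}{\sqrt n} + a_n \le 0$ and $L \ge n$, it is easy to show from here that $|a_n| = O(\frac{1}{L})$. It follows that
$|a_i| = O(\frac{1}{L})$ for all $1 \le i \le n$.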



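Now consider

$$\|M_n v\| = \Big|\sum_{i=1}^{n-1} \Big(\frac{1}{\sqrt n} + a_i\Big)\xi_i + \Big(\frac{-1}{\sqrt n} + a_n\Big)\xi_n\Big|.$$

Since $n$ is even, with probability $\Theta(\frac{1}{\sqrt n})$, $\xi_1 + \dots + \xi_{n-1} - \xi_n = 0$, and in this case

$$\|M_n v\| = \Big|\sum_{i=1}^{n} a_i \xi_i\Big| = O\Big(\frac{n}{L}\Big),$$

as desired.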
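
The probability $\Theta(1/\sqrt n)$ used above can be checked directly. Since $-\xi_n$ is again a Bernoulli variable, the event is that a sum of $n$ independent signs vanishes; the sketch below evaluates this with the central binomial coefficient and Stirling's formula.

```latex
% n even; exactly n/2 of the n signs must equal +1.
\[
\mathbf{P}\big(\xi_1 + \dots + \xi_{n-1} - \xi_n = 0\big)
 = \binom{n}{n/2}\, 2^{-n}
 = \sqrt{\frac{2}{\pi n}}\,\big(1 + o(1)\big)
 = \Theta\Big(\frac{1}{\sqrt n}\Big).
\]
```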

5. Controlled moment

It is convenient to establish some more quantitative control on $x$. We recall the following notion from
[29].

Definition 5.1 (Controlled second moment). Let $\kappa \ge 1$. A complex random variable $x$ is said to
have $\kappa$-controlled second moment if one has the upper bound

$$\mathbf{E}|x|^2 \le \kappa$$

(in particular, $|\mathbf{E}x| \le \kappa^{1/2}$), and the lower bound

$$\mathbf{E}\, \mathrm{Re}(zx - w)^2\, \mathbf{I}(|x| \le \kappa) \ge \frac{1}{\kappa} \mathrm{Re}(z)^2 \tag{7}$$

for all complex numbers $z$, $w$.

Example. The Bernoulli random variable ($\mathbf{P}(x = +1) = \mathbf{P}(x = -1) = 1/2$) has 1-controlled second
moment. The condition (7) asserts in particular that $x$ has variance at least $\frac{1}{\kappa}$, but also asserts that
a significant portion of this variance occurs inside the event $|x| \le \kappa$, and also contains some more
technical phase information about the covariance matrix of $\mathrm{Re}(x)$ and $\mathrm{Im}(x)$.
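
The Bernoulli claim can be verified by a direct computation. The following is a short check of Definition 5.1 with $\kappa = 1$, written out here for illustration.

```latex
% x = +1 or -1 with probability 1/2 each, so |x| <= 1 always and E|x|^2 = 1.
\begin{align*}
\mathbf{E}\,\mathrm{Re}(zx - w)^2\,\mathbf{I}(|x| \le 1)
 &= \tfrac12\big(\mathrm{Re}(z) - \mathrm{Re}(w)\big)^2
  + \tfrac12\big(-\mathrm{Re}(z) - \mathrm{Re}(w)\big)^2 \\
 &= \mathrm{Re}(z)^2 + \mathrm{Re}(w)^2
 \;\ge\; \mathrm{Re}(z)^2 .
\end{align*}
% Hence both the upper bound E|x|^2 <= 1 and the lower bound (7) hold with \kappa = 1.
```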

The following lemma was established in [29]: 

Lemma 5.2. [29, Lemma 2.4] Let $x$ be a complex random variable with finite non-zero variance. Then
there exists a phase $e^{i\theta}$ and a $\kappa \ge 1$ such that $e^{i\theta} x$ has $\kappa$-controlled second moment.

Since rotation by a phase does not affect the conclusion of Theorem 3.2, we conclude that we can
assume without loss of generality that $x$ is $\kappa$-controlled for some $\kappa$. This will allow us to invoke several
estimates from [29] (e.g. Lemma 6.2 and Theorem 6.8 below).

Remark 5.3. The estimates we obtain for Theorem 3.2 will depend on $\kappa$ but will not otherwise depend
on the precise distribution of $x$. It is in fact quite likely that the results in this paper can be generalised
to random matrices $N_n$ whose entries are independent and are all $\kappa$-controlled for a single $\kappa$, but do
not need to be identical. In order to simplify the exposition, however, we focus on the iid case.

6. Small ball bounds 

In this section we give some bounds on the small ball probabilities $\mathbf{P}(|\xi_1 v_1 + \dots + \xi_n v_n - z| \le \varepsilon)$ under
various assumptions on the random variables $\xi_i$ and the coefficients $v_i$. As a consequence we shall be
able to obtain good bounds on the probability that $Av$ is small, where $A$ is a random matrix and $v$ is
a fixed unit vector.

We first recall a standard bound (cf. [29, Lemmas 4.2, 4.3, 5.2]):

Lemma 6.1 (Fourier-analytic bound). Let $\xi_1, \dots, \xi_n$ be independent variables. Then we have the
bound

$$\mathbf{P}(|\xi_1 v_1 + \dots + \xi_n v_n - z| \le r) \ll r^2 \int_{w \in \mathbb{C}: |w| \le 1/r} \exp\Big(-\Theta\Big(\sum_{j=1}^n \|w v_j\|_j^2\Big)\Big)\, dw$$

for any $r > 0$ and $z \in \mathbb{C}$, and any unit vector $v = (v_1, \dots, v_n)$, where

$$\|z\|_j := \big(\mathbf{E} \|\mathrm{Re}(z(\xi_j - \xi_j'))\|_{\mathbb{R}/\mathbb{Z}}^2\big)^{1/2}, \tag{8}$$

$\xi_j'$ is an independent copy of $\xi_j$, and $\|x\|_{\mathbb{R}/\mathbb{Z}}$ denotes the distance from $x$ to the nearest integer.

Proof. By the Esseen concentration inequality (see e.g. [30, Lemma 7.17]), we have

$$\mathbf{P}(|\xi_1 v_1 + \dots + \xi_n v_n - z| \le r) \ll r^2 \int_{w \in \mathbb{C}: |w| \le 1/r} |\mathbf{E}(e(\mathrm{Re}(w(\xi_1 v_1 + \dots + \xi_n v_n))))|\, dw$$

for any $r > 0$, where $e(x) := e^{2\pi i x}$. We can write the right-hand side as

$$r^2 \int_{w \in \mathbb{C}: |w| \le 1/r} \prod_{j=1}^n f_j(w v_j)^{1/2}\, dw$$

where

$$f_j(z) := |\mathbf{E}(e(\mathrm{Re}(\xi_j z)))|^2 = \mathbf{E} \cos(2\pi \mathrm{Re}(z(\xi_j - \xi_j'))).$$

Using the elementary bound $\cos(2\pi\theta) \le 1 - \Theta(\|\theta\|_{\mathbb{R}/\mathbb{Z}}^2)$ we conclude

$$f_j(z) \le 1 - \Theta(\|z\|_j^2) \le \exp(-\Theta(\|z\|_j^2))$$

and the claim follows. □
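
The identity for $f_j$ used in the proof follows from the independence of $\xi_j$ and its copy $\xi_j'$. A brief sketch of that step, with $e(x) = e^{2\pi i x}$ as above:

```latex
% \xi_j' is an independent copy of \xi_j, so the product of expectations
% is the expectation of the product.
\begin{align*}
f_j(z) = \big|\mathbf{E}\, e(\mathrm{Re}(\xi_j z))\big|^2
 &= \mathbf{E}\, e(\mathrm{Re}(\xi_j z))\;\overline{\mathbf{E}\, e(\mathrm{Re}(\xi_j' z))} \\
 &= \mathbf{E}\, e\big(\mathrm{Re}(z(\xi_j - \xi_j'))\big) \\
 &= \mathbf{E} \cos\big(2\pi\, \mathrm{Re}(z(\xi_j - \xi_j'))\big).
\end{align*}
% The last line uses that f_j(z) is real, so only the real part of the
% complex exponential contributes.
```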

Next, we recall some properties of the norms $\|z\|_j$ in the case when $\xi_j$ is $\kappa$-controlled.

Lemma 6.2. Let $1 \le j \le n$, let $\xi_j$ be a random variable, and let $\|\cdot\|_j$ be defined by (8).

(i) For any $w \in \mathbb{C}$, $0 \le \|w\|_j \le 1$ and $\|-w\|_j = \|w\|_j$.

(ii) For any $z, w \in \mathbb{C}$, $\|z + w\|_j \le \|z\|_j + \|w\|_j$.

(iii) If $\xi_j$ is $\kappa$-controlled for some fixed $\kappa$, then for any sufficiently small positive constants $c_0, c_1 > 0$
we have $\|z\|_j \ge c_1 \mathrm{Re}(z)$ whenever $|z| \le c_0$.

Proof. See [29, Lemma 5.3]. □

We now use these bounds to estimate small ball probabilities. We begin with a crude bound.

Corollary 6.3. Let $\xi_1, \dots, \xi_n$ be independent variables which are $\kappa$-controlled. Then there exists a
constant $c > 0$ such that

$$\mathbf{P}(|\xi_1 v_1 + \dots + \xi_n v_n - z| \le c) \le 1 - c \tag{9}$$

for all $z \in \mathbb{C}$ and all unit vectors $(v_1, \dots, v_n)$.

Proof. Let $c > 0$ be a small number to be chosen later. We divide into two cases, depending on
whether all the $v_i$ are bounded in magnitude by $\sqrt{c}$ or not.

Suppose first that $|v_i| \le \sqrt{c}$ for all $i$. Then we apply Lemma 6.1 (with $r := c^{1/4}$) and bound the
left-hand side of (9) by

$$\ll c^{1/2} \int_{w \in \mathbb{C}: |w| \le c^{-1/4}} \exp\Big(-\Theta\Big(\sum_{j=1}^n \|w v_j\|_j^2\Big)\Big)\, dw.$$

By Lemma 6.2, if $c$ is sufficiently small then we have $\|w v_j\|_j \ge c_1 \mathrm{Re}(w v_j)$, for some positive constant
$c_1$. Writing each $v_j$ in polar coordinates as $v_j = r_j e^{2\pi i \theta_j}$, we thus obtain an upper bound of

$$\ll c^{1/2} \int_{w \in \mathbb{C}: |w| \le c^{-1/4}} \exp\Big(-\Theta\Big(\sum_{j=1}^n r_j^2\, \mathrm{Re}(e^{2\pi i \theta_j} w)^2\Big)\Big)\, dw.$$

Since $\sum_{j=1}^n r_j^2 = 1$, we can use Hölder's inequality (or Jensen's inequality) and bound this from above
by

$$\ll \sup_j c^{1/2} \int_{w \in \mathbb{C}: |w| \le c^{-1/4}} \exp\big(-\Theta(\mathrm{Re}(e^{2\pi i \theta_j} w)^2)\big)\, dw$$

which by rotation invariance and scaling is equal to

$$\int_{w \in \mathbb{C}: |w| \le 1} \exp\big(-\Theta(c^{-1/2}\, \mathrm{Re}(w)^2)\big)\, dw.$$

From the monotone convergence theorem (or direct computation) we see that this quantity is less
than $1 - c$ if $c$ is chosen sufficiently small. (If necessary, we allow $c$ to depend on the hidden constant
in $\Theta$.)

Now suppose instead that $|v_1| > \sqrt{c}$ (say). Then by freezing all of the variables $\xi_2, \dots, \xi_n$, we can
bound the left-hand side of (9) by

$$\sup_w \mathbf{P}(|\xi_1 - w| \le \sqrt{c}).$$

But by the definition of $\kappa$-control, one easily sees that this quantity is bounded by $1 - c$ if $c$ is sufficiently
small (compared to $1/\kappa$), and the claim follows. □



As a consequence of this bound, we obtain

Theorem 6.4. Let $N_n$ be an $n \times n$ random matrix whose entries are independent random variables
which are all $\kappa$-controlled for some constant $\kappa > 0$. Then there are positive constants $c, c'$ such that
the following holds. For any unit vector $v$ and any deterministic matrix $M$,

$$\mathbf{P}(\|(M + N_n)v\| \le c n^{1/2}) \le \exp(-c'n).$$

Proof. Let $c$ be a sufficiently small constant, and let $X_1, \dots, X_n$ denote the rows of $M + N_n$. If
$\|(M + N_n)v\| \le c n^{1/2}$, then we have $|\langle X_j, v\rangle| \le c$ for at least $(1 - c)n$ rows. As the events $I_j :=
|\langle X_j, v\rangle| \le c$ are independent, we see from the Chernoff inequality (applied to the sum $\sum_j I_j$ of
indicator variables) that it suffices to show that

$$\mathbf{E}(I_j) = \mathbf{P}(|\langle X_j, v\rangle| \le c) \le 1 - 2c$$

(say) for all $j$. But this follows from Corollary 6.3 (after adjusting $c$ slightly), noting that each $X_j$ is
a translate (by a row of $M$) of a vector whose entries are iid copies of $x$. □

Now we obtain some statements of inverse Littlewood-Offord type.

Definition 6.5 (Compressible and incompressible vectors). For any $a, b > 0$, let $\mathrm{Comp}(a, b)$ be the
set of unit vectors $v$ such that there is a vector $v'$ with at most $an$ non-zero coordinates satisfying
$\|v - v'\| \le b$. We denote by $\mathrm{Incomp}(a, b)$ the set of unit vectors which do not lie in $\mathrm{Comp}(a, b)$.

Definition 6.6 (Rich vectors). For any $\varepsilon, \rho > 0$, let $S_{\varepsilon,\rho}$ be the set of unit vectors $v$ satisfying

$$\sup_{z \in \mathbb{C}} \mathbf{P}(|X \cdot v - z| \le \varepsilon) \ge \rho,$$

where $X = (x_1, \dots, x_n)$ is a vector whose coefficients are iid copies of $x$.

Lemma 6.7 (Very rich vectors are compressible). For any $\varepsilon, \rho > 0$ we have

$$S_{\varepsilon,\rho} \subset \mathrm{Comp}\Big(O\Big(\frac{1}{n\rho^2}\Big), O\Big(\frac{\varepsilon}{\rho}\Big)\Big).$$

Proof. We can assume $\rho \gg n^{-1/2}$ since the claim is trivial otherwise. Let $v \in S_{\varepsilon,\rho}$, thus

$$\mathbf{P}(|X \cdot v - z| \le \varepsilon) \ge \rho$$

for some $z$. From Lemma 6.1 we conclude

$$\varepsilon^2 \int_{w \in \mathbb{C}: |w| \le \varepsilon^{-1}} \exp\Big(-\Theta\Big(\sum_{j=1}^n \|w v_j\|_j^2\Big)\Big)\, dw \gg \rho. \tag{10}$$

Let $s > 0$ be a small constant (independent of $n$) to be chosen later, and let $A$ denote the set of indices
$i$ for which $|v_i| \ge s\varepsilon$. Then from (10) we have

$$\varepsilon^2 \int_{w \in \mathbb{C}: |w| \le \varepsilon^{-1}} \exp\Big(-\Theta\Big(\sum_{j \in A} \|w v_j\|_j^2\Big)\Big)\, dw \gg \rho.$$

Suppose $A$ is non-empty. Applying Hölder's inequality we conclude that

$$\varepsilon^2 \int_{w \in \mathbb{C}: |w| \le \varepsilon^{-1}} \exp\big(-\Theta(|A| \|w v_j\|_j^2)\big)\, dw \gg \rho$$

for some $j \in A$. By the pigeonhole principle, this implies that

$$\big|\{w \in \mathbb{C} : |w| \le \varepsilon^{-1},\ |A| \|w v_j\|_j^2 \le k\}\big| \gg k^{1/2} \varepsilon^{-2} \rho \tag{11}$$

for some integer $k \ge 1$.

If $|A| \ll k$, then the set in (11) has measure $\Theta(\varepsilon^{-2})$, which forces $|A| \ll \rho^{-2}$. Suppose instead that
$k \le s'|A|$ for some small $s' > 0$. Since $|v_j| \ge s\varepsilon$, we have $s'/|v_j| \le s'/(s\varepsilon)$. We will choose $s'$ sufficiently
small to make sure that this ratio is smaller than the constant $c_0$ in Lemma 6.2. By Lemma 6.2, we see
that the intersection of the set in (11) with any ball of radius $s'/|v_j|$ has density at most $\sqrt{k/|A|}$, and
so by covering arguments we can bound the left-hand side of (11) from above by $\ll k^{1/2} |A|^{-1/2} \varepsilon^{-2}$.
Thus we have $|A| \ll \rho^{-2}$ in this case also. Thus we have shown in fact that $|A| \ll \rho^{-2}$ in all cases
(the case when $A$ is empty being trivial).

Now we consider the contribution of those $j$ outside of $A$. From (10) and Lemma 6.2 we have

$$\varepsilon^2 \int_{w \in \mathbb{C}: |w| \le \varepsilon^{-1}} \exp\Big(-\Theta\Big(\sum_{j \notin A} \mathrm{Re}(w v_j)^2\Big)\Big)\, dw \gg \rho.$$

Suppose that $A$ is not all of $\{1, \dots, n\}$. Using polar coordinates $v_j = r_j e^{2\pi i \theta_j}$ as before, we see from
Hölder's inequality that

$$\varepsilon^2 \int_{w \in \mathbb{C}: |w| \le \varepsilon^{-1}} \exp\big(-\Theta(r^2\, \mathrm{Re}(w e^{2\pi i \theta_j})^2)\big)\, dw \gg \rho$$

for some $j \notin A$, where $r^2 := \sum_{j \notin A} r_j^2$. After scaling and rotation invariance, we conclude

$$\int_{w \in \mathbb{C}: |w| \le 1} \exp\Big(-\Theta\Big(\frac{r^2}{\varepsilon^2} \mathrm{Re}(w)^2\Big)\Big)\, dw \gg \rho.$$

The left-hand side can be computed to be at most $O(\varepsilon/r)$. We conclude that $r \ll \varepsilon/\rho$. If we let $v'$
be the restriction of $v$ to $A$, we thus have $\|v - v'\| \ll \varepsilon/\rho$, and the claim $v \in \mathrm{Comp}(O(\frac{1}{n\rho^2}), O(\frac{\varepsilon}{\rho}))$
follows. (The case when $A = \{1, \dots, n\}$ is of course trivial.) □



Roughly speaking, Lemma 6.7 gives a complete characterization of vectors v such that 

supP(|X -v-z\<e)>p, 
zee 

where p > CwT 1 / 2 , for some large constant C. The lemma shows that such a vector v can be 
approximated by a vector v' with at most ^ non-zero coordinates such that \\v — v'\\ < where 
C, C are positive constants. 

The dependence on the parameters here is sharp, up to constant terms. Indeed, in the Bernoulli case, the vector $v = (1,\dots,1,0,\dots,0)$ consisting of $k$ 1s lies in $S_{0,\Theta(1/\sqrt{k})}$ and lies in $\mathrm{Comp}(a, 0)$ precisely when $an \ge k$ (cf. [7]). This shows that the $O(\frac{1}{n\rho^2})$ term on the right-hand side cannot be improved. On the other hand, in the Gaussian case, observe that if $\|v\| \le b$ then $X \cdot v$ will have magnitude $O(\varepsilon)$ with probability $O(\varepsilon/b)$, which shows that the term $O(\frac{\varepsilon}{\rho})$ cannot be improved.
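
The Bernoulli example can be made concrete. As a small sketch (the function name is ours, and we take $x = \pm 1$ with probability $1/2$ each), the largest point mass of $x_1 + \dots + x_k$ is $\binom{k}{\lfloor k/2 \rfloor} 2^{-k}$, and multiplying by $\sqrt{k}$ shows that it is of order $1/\sqrt{k}$:

```python
import math

def max_point_mass(k):
    """Largest atom of x_1 + ... + x_k for independent signs x_i = +1 or -1 (probability 1/2 each)."""
    # The sum equals k - 2 * (number of minus signs), so its most likely value
    # has probability C(k, floor(k/2)) / 2^k.
    return math.comb(k, k // 2) / 2 ** k

if __name__ == "__main__":
    for k in (4, 100, 400, 1600):
        rho = max_point_mass(k)
        # rho * sqrt(k) stays bounded above and below as k grows.
        print(k, rho, rho * math.sqrt(k))
```

So $\sup_z \mathbf{P}(X \cdot v = z)$ is comparable to $k^{-1/2}$ for this $v$, which is the sense in which $k$ non-zero coordinates correspond to $\rho \approx k^{-1/2}$.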

Lemma 6.7 is only non-trivial in the case $\rho \ge C n^{-1/2}$, for some large constant $C$. To handle the case of smaller $\rho$, we use the following more difficult entropy bound from [29].

Theorem 6.8 (Entropy of rich vectors). For any $\varepsilon, \rho$, there is a finite set $S'_{\varepsilon,\rho}$ of size at most $n^{-(1/2-o(1))n}\rho^{-n} + \exp(o(n))$ such that for each $v \in S_{\varepsilon,\rho}$, there is $v' \in S'_{\varepsilon,\rho}$ such that $\|v - v'\|_{\infty} \le \varepsilon$.

Proof. See [29, Theorem 3.2]. □

7. Proof of Theorem 3.2: preliminary reductions 

We now begin the proof of Theorem 3.2. Let $N_n$, $M$, $\gamma$, $A$ be as in that theorem. As remarked in Section 5, we may assume $x$ to be $\kappa$-controlled for some $\kappa$. We allow all implied constants to depend on $\kappa$, $\gamma$, $A$. We may of course assume that $n$ is large compared to these parameters. We may also assume that

(12) $$\mathbf{P}(\|N_n\| \ge n^{\gamma}) \le \frac{1}{2}$$

since the claim is trivial otherwise. By decreasing $A$ if necessary, we may furthermore assume that

(13) $$\mathbf{P}(\|N_n\| \ge n^{\gamma}) \le n^{-A+o(1)}.$$

It will then suffice to show (assuming (12), (13)) that

$$\mathbf{P}(s_n(M_n) \le n^{-(2A+1)\gamma}) \ll n^{-A+\alpha+o(1)}$$

for any constant $\alpha > 0$ (with the implied constants now depending on $\alpha$ also), since the claim then follows by sending $\alpha$ to zero very slowly in $n$.

Fix $\alpha$, and allow all implied constants to depend on $\alpha$. By perturbing $A$ and $\alpha$ slightly we may assume that $A$ is not a half-integer; we can also take $\alpha$ to be small depending on $A$. For example, we can assume that

(14) $$\alpha < \{2A\}/2$$

where $\{2A\}$ is the fractional part of $2A$.

Using the trivial bound $\|N_n\| \ge \sup_{1 \le i,j \le n} |x_{ij}|$, we conclude from (12), (13) that

$$\mathbf{P}(|x_{ij}| \ge n^{\gamma} \text{ for some } i,j) \le \min(\tfrac{1}{2}, n^{-A+o(1)}).$$

Since the $x_{ij}$ are iid copies of $x$, the $n^2$ events $|x_{ij}| \ge n^{\gamma}$ are independent with identical probability. It follows that

(15) $$\mathbf{P}(|x| \ge n^{\gamma}) \le n^{-A-2+o(1)}.$$
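
The passage from the union of the $n^2$ independent events to the single-entry bound (15) can be spelled out as follows. This is a sketch of one standard way to do it; the symbols $q$ and $\delta$ are introduced here for illustration only and are not notation from the original argument.

```latex
% q     := P(|x| >= n^gamma)                     (single-entry tail probability)
% delta := P(|x_ij| >= n^gamma for some i, j)    (at most min(1/2, n^{-A+o(1)}))
\begin{align*}
1 - \delta &= (1 - q)^{n^2}
  && \text{(independence of the $n^2$ events)} \\
n^2 q &\le -n^2 \log(1 - q) = -\log(1 - \delta)
  \le \frac{\delta}{1 - \delta} \le 2\delta
  && \text{(using $\delta \le 1/2$)} \\
q &\le \frac{2\delta}{n^2} \le 2 n^{-A-2+o(1)} = n^{-A-2+o(1)} .
\end{align*}
```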

Let $F$ be the event that $s_n(M_n) \le n^{-(2A+1)\gamma}$, and let $G$ be the event that $\|N_n\| \le n^{\gamma}$. In view of (13), it suffices to show that

$$\mathbf{P}(F \wedge G) \le n^{-A+\alpha+o(1)}.$$

Set

(16) $$b := \beta n^{1/2-\gamma}$$

and

(17) $$a := \frac{\beta}{\log n},$$

where $\beta$ is a small positive constant to be chosen later. We then introduce the following events:

• $F_{\mathrm{comp}}$ is the event that $\|M_n v\| \le n^{-(2A+1)\gamma}$ for some $v \in \mathrm{Comp}(a, b)$.

• $F_{\mathrm{incomp}}$ is the event that $\|M_n v\| \le n^{-(2A+1)\gamma}$ for some $v \in \mathrm{Incomp}(a, b)$.

Observe that if $F$ holds, then at least one of $F_{\mathrm{comp}}$ and $F_{\mathrm{incomp}}$ holds. Theorem 3.2 then follows immediately from the following two lemmas.

Lemma 7.1 (Compressible vector bound). If $\beta$ is sufficiently small, then

$$\mathbf{P}(F_{\mathrm{comp}} \wedge G) \le \exp(-\Omega(n)).$$

Lemma 7.2 (Incompressible vector bound). We have

$$\mathbf{P}(F_{\mathrm{incomp}} \wedge G) \ll n^{-A+o(1)}.$$

In these lemmas we allow the implied constants to depend on $\beta$.

The proof of Lemma 7.1 is simple and will be presented in the next section. The proof of Lemma 7.2 is somewhat more involved and occupies the rest of the paper.

8. Treatment of compressible vectors

If $F_{\mathrm{comp}} \wedge G$ occurs, then by the definition of $\mathrm{Comp}(a, b)$, there are unit vectors $v, v'$ such that $\|M_n v\| \le n^{-(2A+1)\gamma}$ and $v'$ has support on at most $an$ coordinates and $\|v - v'\| \le b$.

By the triangle inequality and (16) we have

$$\|M_n v'\| \le n^{-(2A+1)\gamma} + \|M_n\| \|v - v'\| \le 2\beta n^{1/2}.$$

A set $\mathcal{N}$ of unit vectors in $\mathbf{C}^m$ is called a $\delta$-net if for any unit vector $v$, there is a vector $w$ in $\mathcal{N}$ such that $\|v - w\| \le \delta$. It is well known that for any $0 < \delta < 1$, a $\delta$-net of size $(C\delta^{-1})^m$ exists, for some constant $C$ independent of $\delta$ and $m$.

Using this fact, we conclude that the set of unit vectors with at most $an$ non-zero coordinates admits a $b$-net $\mathcal{N}$ of size at most

$$|\mathcal{N}| \le \binom{n}{an} (C b^{-1})^{an}.$$

Thus, if $F_{\mathrm{comp}} \wedge G$ occurs, then there is a unit vector $v'' \in \mathcal{N}$ such that

$$\|M_n v''\| \le 2\beta n^{1/2} + \|M_n\| b.$$



On the other hand, from Theorem 6.4 we see (for $\beta \le c/3$) that for any fixed $v''$,

$$\mathbf{P}(\|M_n v''\| \le 3\beta n^{1/2}) \le \exp(-c'n),$$

where $c$ and $c'$ are the constants in Theorem 6.4. By the union bound, we conclude

$$\mathbf{P}(F_{\mathrm{comp}} \wedge G) \le \binom{n}{an} (C b^{-1})^{an} \exp(-c'n).$$



But from (16), (17) we see that the right-hand side can be made less than $\exp(-c'n/2)$, given that $\beta$ is sufficiently small. This concludes the proof of Lemma 7.1.
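
To see why a small $\beta$ suffices, one can evaluate the logarithm of the net-size factor numerically. The sketch below assumes that (16) and (17) read $b = \beta n^{1/2-\gamma}$ and $a = \beta/\log n$, and that the net has size at most $\binom{n}{an}(C/b)^{an}$; the value `C = 3.0` is a hypothetical placeholder for the unspecified net constant, and the function name is ours.

```python
import math

def log_net_factor(n, beta, gamma, C=3.0):
    """Natural log of binom(n, k) * (C / b)^k with k = floor(a * n).

    Assumes a = beta / log(n) and b = beta * n^(1/2 - gamma);
    C is a hypothetical value for the constant in the delta-net bound.
    """
    a = beta / math.log(n)
    b = beta * n ** (0.5 - gamma)
    k = math.floor(a * n)  # number of non-zero coordinates allowed
    log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_binom + k * math.log(C / b)

if __name__ == "__main__":
    n = 10 ** 6
    for beta in (0.1, 0.01, 0.001):
        # Compare this ratio with c'/2, where c' is the constant from Theorem 6.4.
        print(beta, log_net_factor(n, beta, gamma=1.0) / n)
```

The printed ratio is roughly $\beta(\gamma - 1/2)$ plus lower-order terms, so it drops below any fixed $c'/2$ once $\beta$ is small enough.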



9. Treatment of incompressible vectors 

We now begin the proof of Lemma 7.2. We fix $\beta$ and allow all implied constants to depend on $\beta$.

Let $X_k$ be the $k$-th row vector of $M_n$, and let $\mathrm{dist}_k$ be the distance from $X_k$ to the subspace spanned by $X_1,\dots,X_{k-1},X_{k+1},\dots,X_n$. We need the following, which is a slight extension of a lemma from [21].

Lemma 9.1. For any $\varepsilon > 0$, and any event $E$, we have

$$\mathbf{P}(\{\|M_n v\| \le \varepsilon b n^{-1/2} \text{ for some } v \in \mathrm{Incomp}(a, b)\} \wedge E) \le \frac{1}{an} \sum_{k=1}^{n} \mathbf{P}(\{\mathrm{dist}_k \le \varepsilon\} \wedge E).$$

Proof. See [21, Lemma 3.5]. The arbitrary event $E$ was not present in that lemma, but one easily verifies that the proof works perfectly well with this event in place. □

Applying this to our current situation with

(18) $$\varepsilon := \frac{1}{\beta} n^{-2A\gamma},$$

we obtain

$$\mathbf{P}(F_{\mathrm{incomp}} \wedge G) \ll \frac{\log n}{n} \sum_{k=1}^{n} \mathbf{P}(\{\mathrm{dist}_k \le \varepsilon\} \wedge G).$$

To prove Lemma 7.2, it therefore suffices (by symmetry) to show that

$$\mathbf{P}(\{\mathrm{dist}_n \le \varepsilon\} \wedge G) \ll n^{-A+\alpha+o(1)}.$$

Notice that there is a unit vector $X_n^*$ orthogonal to $X_1,\dots,X_{n-1}$ such that

(19) $$\mathrm{dist}_n = |X_n \cdot X_n^*|.$$

If there are many such $X_n^*$, choose one arbitrarily. However, note that we can choose $X_n^*$ to depend only on $X_1,\dots,X_{n-1}$ and thus be independent of $X_n$.

Let $\rho := n^{-A+\alpha}$. Let $X$ be the random vector of length $n$ whose coordinates are iid copies of $x$. From Definition 6.6 (and the observation that $X_n$ has the same distribution as $X$ after translating by a deterministic vector, namely the $n$-th row of the deterministic matrix $M$), we have the conditional probability bound

$$\mathbf{P}(\mathrm{dist}_n \le \varepsilon \mid X_n^* \notin S_{\varepsilon,\rho}) \le \rho = n^{-A+\alpha}.$$

Thus it will suffice to establish the exponential bound

$$\mathbf{P}(\{X_n^* \in S_{\varepsilon,\rho}\} \wedge G) \le \exp(-\Omega(n)).$$

Let

(20) $$J := \lfloor 2A \rfloor$$

be the integer part of $2A$. Let $\alpha_1 > 0$ be a sufficiently small constant (independent of $n$ and $\gamma$, but depending on $\alpha$, $A$, $J$) to be chosen later. Set

(21) $$\varepsilon_j := n^{(\gamma+\alpha_1)j} \varepsilon = \frac{1}{\beta} n^{(\gamma+\alpha_1)j} n^{-2A\gamma}$$

and

(22) $$\rho_j := n^{(1/2-\alpha_1)j} \rho = n^{(1/2-\alpha_1)j} n^{-A+\alpha}$$

for all $0 \le j \le J$.

By the union bound, it will suffice to prove the following lemmas.

Lemma 9.2. If $\alpha_1$ is sufficiently small, then for any $0 \le j < J$, we have

(23) $$\mathbf{P}(\{X_n^* \in S_{\varepsilon_j,\rho_j}\} \wedge \{X_n^* \notin S_{\varepsilon_{j+1},\rho_{j+1}}\} \wedge G) \le \exp(-\Omega(n)).$$

Lemma 9.3. If $\alpha_1$ is sufficiently small, then we have

$$\mathbf{P}(X_n^* \in S_{\varepsilon_J,\rho_J}) \le \exp(-\Omega(n)).$$
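
The bookkeeping behind the scales $\varepsilon_j$, $\rho_j$ is easier to follow in terms of exponents of $n$. The sketch below assumes that (20), (21), (22) read $J = \lfloor 2A \rfloor$, $\varepsilon_j = \beta^{-1} n^{(\gamma+\alpha_1)j - 2A\gamma}$ and $\rho_j = n^{(1/2-\alpha_1)j - A + \alpha}$, ignores the constant factor $1/\beta$, and uses sample parameter values chosen purely for illustration.

```python
import math

def chain_exponents(A, gamma, alpha, alpha1):
    """Exponents of n in eps_j and rho_j for j = 0..J (constant factor 1/beta ignored)."""
    J = math.floor(2 * A)
    eps = [(gamma + alpha1) * j - 2 * A * gamma for j in range(J + 1)]
    rho = [(0.5 - alpha1) * j - A + alpha for j in range(J + 1)]
    return J, eps, rho

if __name__ == "__main__":
    # Hypothetical sample values; alpha must be below {2A}/2 = 0.3 as in (14).
    A, gamma, alpha, alpha1 = 1.3, 1.0, 0.1, 0.01
    J, eps, rho = chain_exponents(A, gamma, alpha, alpha1)
    # For j < J: rho_j <= n^(-1/2), so the entropy bound of Theorem 6.8 is the relevant tool.
    print([r <= -0.5 for r in rho[:J]])
    # At j = J: exponent of 1 / (n * rho_J^2), which should be at most -alpha1.
    print(-1 - 2 * rho[J])
    # At j = J: exponent of eps_J / rho_J, which should be negative.
    print(eps[J] - rho[J])
```

For these sample values the three checks give `[True, True]`, an exponent close to $-0.56$, and an exponent close to $-0.36$, matching the regimes treated in Lemma 9.2 ($j < J$) and Lemma 9.3 ($j = J$).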

10. Proof of Lemma 9.2 
Fix $0 \le j < J$. Note that by (14), we have

$$\rho_j \le n^{(J-1)/2} n^{-A+\alpha} \le n^{-1/2-\{2A\}/2+\alpha} \le n^{-1/2}.$$

We can then use Theorem 6.8 to conclude the existence of a set $\mathcal{N}$ of unit vectors such that every vector in $S_{\varepsilon_j,\rho_j}$ lies within $\varepsilon_j$ in $l^\infty$ norm to a vector in $\mathcal{N}$, and with the cardinality bound

(24) $$|\mathcal{N}| \le n^{-(1/2-o(1))n} \rho_j^{-n}.$$

Suppose that the event in Lemma 9.2 holds. Then we can find $u \in \mathcal{N}$ such that $\|u - X_n^*\|_{l^\infty} \le \varepsilon_j$, and thus $\|u - X_n^*\| \le n^{1/2}\varepsilon_j$. On the other hand, since $X_n^*$ is orthogonal to $X_1,\dots,X_{n-1}$ and $\|M_n\| \ll n^{\gamma}$, we have

$$\Big(\sum_{i=1}^{n-1} |X_i \cdot u|^2\Big)^{1/2} = \Big(\sum_{i=1}^{n-1} |X_i \cdot (u - X_n^*)|^2\Big)^{1/2} \le \|M_n(u - X_n^*)\| \ll n^{1/2} n^{-\alpha_1} \varepsilon_{j+1}.$$

On the other hand, from (23) and Definition 6.6 we have

(25) $$\mathbf{P}(|X \cdot X_n^* - z| \le \varepsilon_{j+1}) \le \rho_{j+1}$$

for all $z \in \mathbf{C}$, where $X = (x_1,\dots,x_n)$ consists of iid copies of $x$.

To conclude the proof, we will need the following lemma. 
Lemma 10.1. If $w$ is any vector with $\|w\|_{l^\infty} \le 1$, then

$$\mathbf{P}(|X \cdot w| \ge n^{\gamma+\alpha_1}) \ll n^{-A}.$$

Proof. Write $w = (w_1,\dots,w_n)$ and $X = (x_1,\dots,x_n)$. Observe from (15) that with probability $1 - O(n^{-A-1+o(1)}) = 1 - O(n^{-A})$, all the coefficients in $X$ are going to be of magnitude at most $n^{\gamma}$. Thus it suffices to show that

$$\mathbf{P}(|w_1 \tilde{x}_1 + \dots + w_n \tilde{x}_n| \ge n^{\gamma+\alpha_1}) \ll n^{-A},$$

where $\tilde{x}_1,\dots,\tilde{x}_n$ are iid with law equal to that of $x$ conditioned to the event $|x| \le n^{\gamma}$. As $x$ has mean zero and bounded second moment, one verifies from (15) and Cauchy-Schwarz that the mean of the $\tilde{x}_i$ is $O(n^{-(A+2)/2})$. Thus if we let $x'_i := \tilde{x}_i - \mathbf{E}(\tilde{x}_i)$, we see that it suffices to show that

$$\mathbf{P}(|w_1 x'_1 + \dots + w_n x'_n| \ge \tfrac{1}{2} n^{\gamma+\alpha_1}) \ll n^{-A}.$$

We conclude the proof by the moment method, using the following estimate

$$\mathbf{E}(|w_1 x'_1 + \dots + w_n x'_n|^{2k}) \ll_k n^{2k\gamma}$$

for any integer $k \ge 0$. This is easily verified by a standard computation (using the hypothesis $\gamma \ge 1/2$), since all the $x'_i$ have vanishing first moment, a second moment of $O(1)$, and a $j$-th moment of $O_j(n^{(j-2)\gamma})$ for any $j > 2$. Now take $k$ to be a constant sufficiently large compared to $A/\alpha_1$. □
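
The last step of the moment method is an application of Markov's inequality to the $2k$-th moment. A sketch, with $S := w_1 x'_1 + \dots + w_n x'_n$ introduced here as shorthand (not notation from the original text):

```latex
\begin{align*}
\mathbf{P}\Big(|S| \ge \tfrac12 n^{\gamma+\alpha_1}\Big)
  &\le \frac{\mathbf{E}|S|^{2k}}{\big(\tfrac12 n^{\gamma+\alpha_1}\big)^{2k}}
   \ll_k \frac{n^{2k\gamma}}{n^{2k(\gamma+\alpha_1)}}
   = n^{-2k\alpha_1} .
\end{align*}
% The right-hand side is at most n^{-A} as soon as k >= A / (2 \alpha_1),
% which is why k is taken large compared to A / \alpha_1.
```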



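We are now ready to finish the proof of Lemma 9.2. From Lemma 10.1 and the bound $\|u - X_n^*\|_{l^\infty} \le \varepsilon_j$ we see that

$$\mathbf{P}(|X \cdot (X_n^* - u)| \ge \varepsilon_{j+1}) \ll n^{-A} \le \rho_{j+1};$$

combining this with (25) using the triangle inequality, we see that

(26) $$\sup_{z \in \mathbf{C}} \mathbf{P}(|X \cdot u - z| \le \varepsilon_{j+1}) \ll \rho_{j+1}.$$

We can therefore bound the left-hand side of (23) by

$$\sum_{u \in \mathcal{N}:\ (26) \text{ holds}} \mathbf{P}\Big(\Big(\sum_{i=1}^{n-1} |X_i \cdot u|^2\Big)^{1/2} \ll n^{1/2} n^{-\alpha_1} \varepsilon_{j+1}\Big).$$

Now suppose that $u \in \mathcal{N}$ obeys (26). If we have $(\sum_{i=1}^{n-1} |X_i \cdot u|^2)^{1/2} \ll n^{1/2} n^{-\alpha_1} \varepsilon_{j+1}$, then the event $|X_i \cdot u| \le \varepsilon_{j+1}$ must hold for at least $n - O(n^{1-2\alpha_1})$ values of $i$. On the other hand, from (26) we see that each of these events $|X_i \cdot u| \le \varepsilon_{j+1}$ only occurs with probability $O(\rho_{j+1})$. We can thus bound

$$\mathbf{P}\Big(\Big(\sum_{i=1}^{n-1} |X_i \cdot u|^2\Big)^{1/2} \ll n^{1/2} n^{-\alpha_1} \varepsilon_{j+1}\Big) \le \binom{n}{n - O(n^{1-2\alpha_1})} (O(\rho_{j+1}))^{n - O(n^{1-2\alpha_1})} \ll n^{o(n)} \rho_{j+1}^{n}.$$

Applying (24), we can thus bound the left-hand side of (23) by

$$\ll n^{-(1/2-o(1))n} \rho_j^{-n}\, n^{o(n)} \rho_{j+1}^{n} = n^{-(\alpha_1-o(1))n}$$

and the claim follows.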
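
The final equality uses only the definition (22) of $\rho_j$, which gives a fixed ratio between consecutive scales. A sketch of the computation:

```latex
\begin{align*}
\frac{\rho_{j+1}}{\rho_j} &= n^{1/2-\alpha_1}
  && \text{(from (22))} \\
n^{-(1/2-o(1))n}\, \rho_j^{-n} \cdot n^{o(n)}\, \rho_{j+1}^{n}
  &= n^{-(1/2-o(1))n}\, n^{o(n)}\, n^{(1/2-\alpha_1)n}
   = n^{-(\alpha_1-o(1))n} .
\end{align*}
% Since \alpha_1 > 0 is a fixed constant, the right-hand side is at most
% \exp(-\Omega(n)) for n large, as required in (23).
```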



11. Proof of Lemma 9.3 



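Suppose that $X_n^*$ lies in $S_{\varepsilon_J,\rho_J}$. Then by Lemma 6.7, we have

$$X_n^* \in \mathrm{Comp}\Big(O\Big(\frac{1}{n\rho_J^2}\Big), O\Big(\frac{\varepsilon_J}{\rho_J}\Big)\Big).$$

Note from (22) and (20) that

$$\frac{1}{n\rho_J^2} = n^{2A-J-1+2\alpha_1 J-2\alpha} \le n^{-\alpha_1}$$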
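if $\alpha_1$ is sufficiently small. Thus, by arguing as in Section 8, the set $\mathrm{Comp}(O(\frac{1}{n\rho_J^2}), O(\frac{\varepsilon_J}{\rho_J}))$ has an $O(\frac{\varepsilon_J}{\rho_J})$-net $\mathcal{N}$ in $l^2$ of cardinality

$$|\mathcal{N}| \le \binom{n}{O(1/\rho_J^2)} \Big(O\Big(\frac{\rho_J}{\varepsilon_J}\Big)\Big)^{O(1/\rho_J^2)} = \exp(o(n)).$$

If we let $u \in \mathcal{N}$ be within $O(\frac{\varepsilon_J}{\rho_J})$ of $X_n^*$, then we have $|X_i \cdot u| \ll \frac{\varepsilon_J}{\rho_J}$ for all $1 \le i \le n-1$. Thus we can bound

$$\mathbf{P}(X_n^* \in S_{\varepsilon_J,\rho_J}) \le \sum_{u \in \mathcal{N}} \mathbf{P}\Big(|X_i \cdot u| \ll \frac{\varepsilon_J}{\rho_J} \text{ for all } 1 \le i \le n-1\Big).$$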
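Now observe from (21), (22), (20) and the hypothesis $\gamma \ge 1/2$ that

$$\frac{\varepsilon_J}{\rho_J} = \frac{1}{\beta} n^{-\alpha+2\alpha_1 J} n^{-(2A-J)(\gamma-1/2)} \le n^{-\alpha/2}$$

(say) if $\alpha_1$ is sufficiently small. Thus by Corollary 6.3 (or by a minor modification of Theorem 6.4) we see that

$$\mathbf{P}\Big(|X_i \cdot u| \ll \frac{\varepsilon_J}{\rho_J} \text{ for all } 1 \le i \le n-1\Big) \le \exp(-\Omega(n))$$

for each $u \in \mathcal{N}$, and the claim follows.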